现有的基于视频的人重新识别(REID)的方法主要通过功能提取器和功能聚合器来了解给定行人的外观特征。但是,当不同的行人外观相似时,外观模型将失败。考虑到不同的行人具有不同的步行姿势和身体比例,我们建议学习视频检索的外观功能之外的歧视性姿势功能。具体而言,我们实现了一个两分支的体系结构,以单独学习外观功能和姿势功能,然后将它们串联在一起进行推理。为了学习姿势特征,我们首先通过现成的姿势检测器检测到每个框架中的行人姿势,并使用姿势序列构建时间图。然后,我们利用复发图卷积网络(RGCN)来学习时间姿势图的节点嵌入,该姿势图设计了一种全局信息传播机制,以同时实现框内节点的邻域聚集,并在框架间图之间传递消息。最后,我们提出了一种由节点注意和时间注意的双重意见方法,以从节点嵌入中获得时间图表示,其中采用自我注意机制来了解每个节点和每个帧的重要性。我们在三个基于视频的REID数据集(即火星,Dukemtmc和Ilids-Vid)上验证了所提出的方法,其实验结果表明,学习的姿势功能可以有效地改善现有外观模型的性能。
translated by 谷歌翻译
基于模型的步态识别方法通常采用行人步行姿势来识别人类。但是,由于摄像头视图的改变,现有方法并未明确解决人类姿势的较大阶层差异。在本文中,我们建议通过通过低UPPER生成的对抗网络(Lugan)学习全级转换矩阵来为每个单视姿势样本生成多视图姿势序列。通过摄像机成像的先验,我们得出的是,跨视图之间的空间坐标满足了全级矩阵的线性转换,因此,本文采用了对抗性训练来从源姿势学习转换矩阵,并获得目标视图以获得目标。目标姿势序列。为此,我们实现了由图形卷积(GCN)层组成的发电机,完全连接(FC)层和两支分支卷积(CNN)层:GCN层和FC层编码源姿势序列和目标视图,然后是CNN分支最后,分别学习一个三角形基质和上三角基质,最后它们被乘以制定全级转换矩阵。出于对抗训练的目的,我们进一步设计了一个条件鉴别因子,该条件区分姿势序列是真实的还是产生的。为了启用高级相关性学习,我们提出了一个名为Multi尺度超图卷积(HGC)的插件播放模块,以替换基线中的空间图卷积层,该层可以同时模拟联合级别的部分,部分部分 - 水平和身体水平的相关性。在两个大型步态识别数据集(即CASIA-B和OUMVLP置位)上进行的广泛实验表明,我们的方法的表现优于基线模型,并以一个较大的边距基于基于姿势的方法。
translated by 谷歌翻译
人类智能能够首先学习一些基本技能,以解决基本问题,然后将这种基本技能融合到解决复杂或新问题的复杂技能中。例如,基本技能``挖洞'',``放树,'''``回填''和``浇水'''构成复杂的技能``植物''。此外,可以重复使用一些基本技能来解决其他问题。例如,基本技能``挖洞''不仅可以用于种植树木,而且还可以用于采矿,建造排水管或垃圾填埋场。学习基本技能并重复使用各种任务的能力对人类非常重要,因为它有助于避免学习太多的技能来解决每个任务,并可以通过仅学习几个数量来解决组成数量的任务数量基本技能,可以节省人脑中大量的记忆和计算。我们认为,机器智能还应捕捉学习基本技能并通过构成复杂技能的能力。在计算机科学语言中,每种基本技能都是“模块”,它是一个可重复使用的具体含义的网络,并执行特定的基本操作。将模块组装成更大的``模型'',以完成更复杂的任务。组装过程适应输入或任务,即,对于给定的任务,应该将模块组装成解决任务的最合适的模型中。结果,不同的输入或任务可能具有不同的组装模型,从而实现自组装AI。在这项工作中,我们提出了模块化的自适应神经体系结构搜索(MANAS),以演示上述想法。不同数据集上的实验表明,MANAS组装的自适应体系结构优于静态全局体系结构。进一步的实验和经验分析为魔力的有效性提供了见解。
translated by 谷歌翻译
图表可以表示实体之间的关系信息,图形结构广泛用于许多智能任务,例如搜索,推荐和问题应答。然而,实际上大多数图形结构数据都遭受了不完整性,因此链路预测成为一个重要的研究问题。虽然提出了许多模型来用于链路预测,但以下两个问题仍然仍然较少:(1)大多数方法在不利用相关链路中使用丰富的信息,大多数方法都独立模型,并且(2)现有型号主要基于关联设计学习并没有考虑推理。通过这些问题,在本文中,我们提出了图表协作推理(GCR),它可以使用邻居与逻辑推理视角的关系中的关系推理。我们提供了一种简单的方法来将图形结构转换为逻辑表达式,以便链路预测任务可以转换为神经逻辑推理问题。我们应用逻辑受限的神经模块根据逻辑表达式构建网络架构,并使用反向传播以有效地学习模型参数,这在统一架构中桥接可分辨率的学习和象征性推理。为了展示我们工作的有效性,我们对图形相关任务进行实验,例如基于常用的基准数据集的链路预测和推荐,我们的图表合作推理方法实现了最先进的性能。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering result and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays high attention on the topic related words for topic extraction because of its self-attention architecture. Moreover, the training of enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.
translated by 谷歌翻译
This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
translated by 谷歌翻译
Capturing feature information effectively is of great importance in vision tasks. With the development of convolutional neural networks (CNNs), concepts like residual connection and multiple scales promote continual performance gains on diverse deep learning vision tasks. However, the existing methods do not organically combined advantages of these valid ideas. In this paper, we propose a novel CNN architecture called GoogLe2Net, it consists of residual feature-reutilization inceptions (ResFRI) or split residual feature-reutilization inceptions (Split-ResFRI) which create transverse passages between adjacent groups of convolutional layers to enable features flow to latter processing branches and possess residual connections to better process information. Our GoogLe2Net is able to reutilize information captured by foregoing groups of convolutional layers and express multi-scale features at a fine-grained level, which improves performances in image classification. And the inception we proposed could be embedded into inception-like networks directly without any migration costs. Moreover, in experiments based on popular vision datasets, such as CIFAR10 (97.94%), CIFAR100 (85.91%) and Tiny Imagenet (70.54%), we obtain better results on image classification task compared with other modern models.
translated by 谷歌翻译
Despite some successful applications of goal-driven navigation, existing deep reinforcement learning-based approaches notoriously suffers from poor data efficiency issue. One of the reasons is that the goal information is decoupled from the perception module and directly introduced as a condition of decision-making, resulting in the goal-irrelevant features of the scene representation playing an adversary role during the learning process. In light of this, we present a novel Goal-guided Transformer-enabled reinforcement learning (GTRL) approach by considering the physical goal states as an input of the scene encoder for guiding the scene representation to couple with the goal information and realizing efficient autonomous navigation. More specifically, we propose a novel variant of the Vision Transformer as the backbone of the perception system, namely Goal-guided Transformer (GoT), and pre-train it with expert priors to boost the data efficiency. Subsequently, a reinforcement learning algorithm is instantiated for the decision-making system, taking the goal-oriented scene representation from the GoT as the input and generating decision commands. As a result, our approach motivates the scene representation to concentrate mainly on goal-relevant features, which substantially enhances the data efficiency of the DRL learning process, leading to superior navigation performance. Both simulation and real-world experimental results manifest the superiority of our approach in terms of data efficiency, performance, robustness, and sim-to-real generalization, compared with other state-of-art baselines. Demonstration videos are available at \colorb{https://youtu.be/93LGlGvaN0c.
translated by 谷歌翻译